Acoustic detection of multiple birds in environmental audio by Matching Pursuit

Authors

  • Dan Stowell
  • Mark D. Plumbley
Abstract

We describe a submission to the ICML 2013 Bird Challenge, in which we explore the use of sparse representations as an advance on the standard technique of cross-correlation template matching in time-frequency representations. The Matching Pursuit algorithm is used to represent the signal as a sparse set of activations of templates derived from the challenge training audio.

Given an audio recording, automatically detecting which bird species are present is a challenging task, and one that is relevant to practical applications in bioacoustics (Stowell & Plumbley, 2010). Recent research goes beyond single-label classification: it can identify multiple species present simultaneously in a recording (Briggs et al., 2012), or track multiple birds through an audio scene (Stowell & Plumbley, submitted). The ICML 2013 Bird Challenge (www.kaggle.com/c/the-icml-2013-bird-challenge) stimulates developments in the field by challenging researchers to identify algorithmically which of 35 bird species are present in a public dataset of 90 audio recordings. The present note describes a contribution to the challenge which explores the use of sparse representations in a multi-label classification problem.

In signal processing, a sparse representation is recovered by assuming that the signal is composed from some “dictionary” of atomic elements, with only a small number of those elements being active (nonzero) for any given signal of interest (Plumbley et al., 2010). This approach is motivated by the discovery that neural coding often makes use of such sparsity, and also by the engineering prospect of representing signals in highly compact form. Sparse representations are currently the subject of much research activity, and have been used in audio and music for tasks such as audio compression and transcription (Plumbley et al., 2010).

Our submission explores sparse representation to improve on the common technique of cross-correlation template matching in time-frequency representations (such as spectrograms). In the standard cross-correlation scenario, we have one or more templates per species, and each template is separately cross-correlated against the spectrogram in question. Peaks in the cross-correlation function are taken as detections for the corresponding species. However, when there is a large number of species to be detected, and some of them may have very similar templates, there is a problem: a single region of energy in the spectrogram (e.g. a single birdsong syllable) could independently match multiple templates, giving spurious detections of many species from a single sound. Sparse decompositions can overcome this by finding a representation of the signal as a sum of activations from all elements considered together as a dictionary.

Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA, 2013. JMLR: W&CP volume 28. Copyright 2013 by the author(s).
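The contrast between independent cross-correlation and a joint sparse decomposition can be illustrated with a small sketch. It is not the authors' submission. It assumes full-band templates slid along the time axis only, templates scaled to unit norm, a hypothetical detection threshold of 2.0, and two invented, deliberately similar templates (`species_A`, `species_B`) that share three of their four active cells. All function names are hypothetical, and the code uses plain Python lists to stay dependency-free.

```python
import math


def normalise(template):
    """Scale a template (list of frequency rows) to unit Frobenius norm."""
    norm = math.sqrt(sum(v * v for row in template for v in row))
    return [[v / norm for v in row] for row in template]


def correlations(spec, template):
    """Cross-correlate a full-band template with the spectrogram along time only.

    Returns one inner-product score per time offset.
    """
    width = len(template[0])
    n_offsets = len(spec[0]) - width + 1
    return [
        sum(s_row[t + j] * tpl_row[j]
            for s_row, tpl_row in zip(spec, template)
            for j in range(width))
        for t in range(n_offsets)
    ]


def template_matching(spec, templates, threshold):
    """Standard approach: every template is tested independently."""
    return {name for name, tpl in templates.items()
            if max(correlations(spec, tpl)) >= threshold}


def matching_pursuit(spec, templates, threshold, max_iter=50):
    """Greedy sparse decomposition over the whole dictionary of templates.

    Each step picks the single best (template, offset) pair across ALL templates,
    subtracts its scaled copy from the residual and repeats.
    """
    residual = [list(row) for row in spec]
    activations = []  # (template name, time offset, coefficient)
    for _ in range(max_iter):
        name, t, coef = max(
            ((name, t, c) for name, tpl in templates.items()
             for t, c in enumerate(correlations(residual, tpl))),
            key=lambda item: item[2],
        )
        if coef < threshold:
            break
        for f, tpl_row in enumerate(templates[name]):
            for j, v in enumerate(tpl_row):
                residual[f][t + j] -= coef * v
        activations.append((name, t, coef))
    return activations, residual


def sweep(cells, n_freq=6, width=4):
    """Build a binary template with ones at the given (frequency, frame) cells."""
    tpl = [[0.0] * width for _ in range(n_freq)]
    for f, t in cells:
        tpl[f][t] = 1.0
    return tpl


# Two hypothetical, deliberately similar templates (6 frequency bins x 4 frames).
templates = {
    "species_A": normalise(sweep([(1, 0), (2, 1), (3, 2), (4, 3)])),  # rising sweep
    "species_B": normalise(sweep([(1, 0), (2, 1), (3, 2), (3, 3)])),  # flattens at end
}

# Toy spectrogram (6 bins x 16 frames) holding ONE syllable of species A at frame 6.
spec = [[0.0] * 16 for _ in range(6)]
for f, row in enumerate(templates["species_A"]):
    for j, v in enumerate(row):
        spec[f][6 + j] = 4.0 * v

print(sorted(template_matching(spec, templates, threshold=2.0)))
# ['species_A', 'species_B']
print(matching_pursuit(spec, templates, threshold=2.0)[0])
# [('species_A', 6, 4.0)]
```

With independent cross-correlation, the single syllable exceeds the threshold for both templates (peak scores 4.0 and 3.0), so two species are reported from one sound. Matching Pursuit explains the syllable with one activation of `species_A`; the residual is then zero and `species_B` is never selected. A practical system would work on real spectrograms and might also search over frequency offsets and tune the stopping threshold on training data.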


Similar resources

A Review on Signal Decomposition Techniques

Analysis of audio and musical information signals deals with the decomposition into atoms. This subject is most interesting and useful for the researchers who want to invent the inherent properties of the signal under decomposition and to construct a new version of it. Many algorithms have been proposed for the decomposition of audio and musical content and methodologies have been demonstrated ...

Harmonic decomposition of audio signals with matching pursuit

We introduce a dictionary of elementary waveforms, called harmonic atoms, that extends the Gabor dictionary and fits well the natural harmonic structures of audio signals. By modifying the “standard” matching pursuit, we define a new pursuit along with a fast algorithm, namely the Fast Harmonic Matching Pursuit, to approximate N-dimensional audio signals with a linear combination of M harmonic ...

Automatic Transcription of Polyphonic Musical Signals with Linear Matching Pursuit

Automatic Transcription of Polyphonic Musical Signals with Linear Matching Pursuit By Andrew McLeod The Harmonic Matching Pursuit (HMP) algorithm has offered promising results in the automatic transcription of audio signals. It works by decomposing the given signal into a set of harmonic atoms, and then grouping those atoms into individual notes. HMP has shown very promising results, but more r...

Environmental Sound Recognition With Time-Frequency Audio Features

The paper considers the task of recognizing environmental sounds for the understanding of a scene or context surrounding an audio sensor. A variety of features have been proposed for audio recognition, including the popular Mel-frequency cepstral coefficients (MFCCs) which describe the audio spectral shape. Environmental sounds, such as chirpings of insects and sounds of rain which are typicall...

Environmental sniffing: robust digit recognition for an in-vehicle environment

In this paper, we propose to integrate an Environmental Sniffing [1] framework, into an in-vehicle hands-free digit recognition task. The framework of Environmental Sniffing is focused on detection, classification and tracking changing acoustic environments. Here, we extend the framework to detect and track acoustic environmental conditions in a noisy-speech audio stream. Knowledge extracted ab...


Publication date: 2013